reward conditioning

酬尝交替学习